12. Problem Set

Problem Set 1 - Relative Probabilities

Relative Probabilities

In [2]:
# just a helper function for easier youtube call
def strip_url(url):
    return url.replace('https://youtu.be/','')

from IPython.display import YouTubeVideo
url = 'https://youtu.be/kmqfftmc6hE'
YouTubeVideo(strip_url(url))
Out[2]:

Ans

$$ \textstyle \begin{array}{l} \dfrac {p(Exactly\ One\ Head)}{p(First\ flip\ is\ Only\ Head)}\\ \\ \text {Total No of flips } = 4 \\ \\ \text {Total No of outcomes} = 2^4 = 16 \\ \\ \end{array} $$

It is tedious to write out all 16 outcomes or sketch a tree, so we could only focus on favourable outcomes (other outcomes only needed for total count)

Exactly One Head:

F1 F2 F3 F4
H T T T
T H T T
T T H T
T T T H

First Flip is Only Head:

F1 F2 F3 F4
H T T T

$$ \textstyle \therefore \dfrac { \dfrac {4}{16}}{ \dfrac {1}{16}} = 4 $$

Relative Probabilities 2

In [3]:
url = 'https://youtu.be/Ug-2cy_BNVo'
YouTubeVideo(strip_url(url))
Out[3]:

First Flip is Heads:

F1 F2 F3 F4
H H T T
H T H T
H T T H
H T T T
H T H H
H H T H
H H H T
H H H H

8 favourable outcomes for denominator. So

$$ \dfrac { \dfrac {4}{16}}{ \dfrac {8}{16}} = \dfrac {4}{8} = 0.5 $$

Same Coin

This was tough.

In [4]:
url = 'https://youtu.be/YQhStz2eHYY'
YouTubeVideo(strip_url(url))
Out[4]:
In [5]:
from graphviz import Digraph
from helpers import draw_samecoin_graph, update_samecoin_graph

g = Digraph()

g = draw_samecoin_graph(g)  # hardcoded for this problem now

# Root - Choosing between Fair or Lodged Coin
g = update_samecoin_graph(g, highlight_nodes=['Root'],color='#33EA4C:#A2E9FF')

# Fair Coin - Choosing between Person 1 (HHT) or Person 2 (THH)
g = update_samecoin_graph(g, highlight_nodes=['F'],color='#33EA4C:#A2E9FF')

# Fair Coin - HHT - Person 1 - Me - M
highlight_nodes = ['H1','H3','T7']
g = update_samecoin_graph(g, highlight_nodes=highlight_nodes,color='#33EA4C')

# Fair Coin - THH - Person 2 - You - Y
highlight_nodes = ['T1','H4','H9']
g = update_samecoin_graph(g, highlight_nodes=highlight_nodes,color='#A2E9FF')

# Loaded Coin - Choosing between Person 1 (HHT) or Person 2 (THH)
g = update_samecoin_graph(g, highlight_nodes=['L'],color='#33EA4C:#A2E9FF')

# Loaded Coin - HHT - Person 1 - Me - M
highlight_nodes = ['H2','H5','T11']
g = update_samecoin_graph(g, highlight_nodes=highlight_nodes,color='#33EA4C')

# Loaded Coin - THH - Person 2 - You - Y
highlight_nodes = ['T2','H6','H13']
g = update_samecoin_graph(g, highlight_nodes=highlight_nodes,color='#A2E9FF')

g
Out[5]:
%3 Root R F F Root->F 0.5 L L Root->L 0.5 H1 H F->H1 0.5 T1 T F->T1 0.5 H2 H L->H2 0.9 T2 T L->T2 0.1 H3 H H1->H3 0.5 T3 T H1->T3 0.5 H4 H T1->H4 0.5 T4 T T1->T4 0.5 H5 H H2->H5 0.9 T5 T H2->T5 0.1 H6 H T2->H6 0.9 T6 T T2->T6 0.1 H7 H H3->H7 0.5 T7 T H3->T7 0.5 H8 H T3->H8 0.5 T8 T T3->T8 0.5 H9 H H4->H9 0.5 T9 T H4->T9 0.5 H10 H T4->H10 0.5 T10 T T4->T10 0.5 H11 H H5->H11 0.9 T11 T H5->T11 0.1 H12 H T5->H12 0.9 T12 T T5->T12 0.1 H13 H H6->H13 0.9 T13 T H6->T13 0.1 H14 H T6->H14 0.9 T14 T T6->T14 0.1

Green flow indicates Person 1 picking up a Coin (Fair or Loaded) and flipping it thrice
Blue flow indicates Person 2 picking up a Coin (Fair or Loaded) and flipping it thrice

Person 1 always gets the order HHT so HHT characterizes Person 1 Person 2 always gets the order THH so THH characterizes Person 2

Let us denote Person 1 by M (Me), and Person 2 by Y (You)

Thus, $p(HHT)$ is same as $p(M)$ and $p(THH)$ is same as $p(Y)$. With this understanding, (closely follow above diagram for below calculations)

$$ \textstyle \begin{array}{l} p(\ F\ \cap\ H\ \cap\ H\ \cap\ T\ ) = p(\ F\ \cap\ HHT\ ) = p(\ F\ \cap\ M\ ) \\ \\ \text {Also} \\ \\ p(\ F\ \cap\ H\ \cap\ H\ \cap\ T\ ) = p(\ F\ )p(\ H\ )p(\ H\ )p(\ T\ ) = (0.5)^4 = 0.0625\\ \\ \therefore\ p(\ F\ \cap\ M\ ) = 0.0625 \\ \\ \text {Similarly} \\ \\ p(\ F\ \cap\ T\ \cap\ H\ \cap\ H\ ) = p(\ F\ )p(\ T\ )p(\ H\ )p(\ H\ ) = (0.5)^4 = 0.0625\\ \\ \therefore\ p(\ F\ \cap\ Y\ ) = 0.0625 \\ \\ p(\ L\ \cap\ H\ \cap\ H\ \cap\ T\ ) = p(\ L\ )p(\ H\ )p(\ H\ )p(\ T\ ) = (0.5)(0.9)^2(0.1) = 0.0405\\ \\ \therefore\ p(\ L\ \cap\ M\ ) = 0.0405 \\ \\ p(\ L\ \cap\ T\ \cap\ H\ \cap\ H\ ) = p(\ L\ )p(\ T\ )p(\ H\ )p(\ H\ ) = (0.5)(0.1)(0.9)^2 = 0.0405\\ \\ \therefore\ p(\ L\ \cap\ Y\ ) = 0.0405 \\ \\ \text {Summarizing..} \\ \\ p(\ F\ \cap\ M\ ) = 0.0625 \\ p(\ F\ \cap\ Y\ ) = 0.0625 \\ p(\ L\ \cap\ M\ ) = 0.0405 \\ p(\ L\ \cap\ Y\ ) = 0.0405 \\ \\ \end{array} $$
Applying Bayes Theorem, $$ \textstyle \begin{array}{l} p(\ F\ |\ M\ ) = \dfrac {p(\ F\ \cap\ M\ )}{\sum p(\ M\ )} = \dfrac {p(\ F\ \cap\ M\ )}{ p(\ F\ \cap\ M\ ) + p(\ L\ \cap\ M\ )} \\ \\ = \dfrac {0.0625}{0.0625 + 0.0405} = 0.6068 \\ \\ p(\ F\ |\ Y\ ) = \dfrac {p(\ F\ \cap\ Y\ )}{\sum p(\ Y\ )} = \dfrac {p(\ F\ \cap\ Y\ )}{ p(\ F\ \cap\ Y\ ) + p(\ L\ \cap\ Y\ )} \\ \\ = \dfrac {0.0625}{0.0625 + 0.0405} = 0.6068 \\ \\ p(\ L\ |\ M\ ) = \dfrac {p(\ L\ \cap\ M\ )}{\sum p(\ M\ )} = \dfrac {p(\ L\ \cap\ M\ )}{ p(\ L\ \cap\ M\ ) + p(\ L\ \cap\ M\ )} \\ \\ = \dfrac {0.0405}{0.0625 + 0.0405} = 0.3932 \\ \\ p(\ L\ |\ Y\ ) = \dfrac {p(\ L\ \cap\ Y\ )}{\sum p(\ Y\ )} = \dfrac {p(\ L\ \cap\ Y\ )}{ p(\ F\ \cap\ Y\ ) + p(\ L\ \cap\ Y\ )} \\ \\ = \dfrac {0.0405}{0.0625 + 0.0405} = 0.3932 \\ \\ \\ \text {Summarizing..} \\ \\ p(\ F\ |\ M\ ) = 0.6068 \\ p(\ F\ |\ Y\ ) = 0.6068 \\ p(\ L\ |\ M\ ) = 0.3932 \\ p(\ L\ |\ Y\ ) = 0.3932 \\ \end{array} $$

Question: What is the probability that we flipped coins of the same type?

There are 2 possible cases of we both having same type of coin, and they are independent of each other.

  1. Both have fair coins and initiate our flip chain - case A
  2. Both have loaded coins and initiate our flip chain - case B

Since 2 cases, probability of either case happening is simply their sum (as they are independent as well)

$$ p(\ A\ \cup\ B\ ) = p(\ A\ ) + p(\ B\ ) $$

$p(\ A\ ) $:

Both have fair coins and initiate our flip chain. That is,

  1. Given its me or HHT, coin being F, that is, $p(\ F\ |\ M\ )$ - case A1
  2. Given its you or THH, coin being F, that is, $p(\ F\ |\ Y\ )$ - case A2

We need both to be same type, in the sense, both case A1 and A2 should happen. Thus

$$ p(\ A\ ) = p(\ A1\ \cap A2\ ) $$

A1 and A2 are also independent to each other. We do not have influence on each other. So,..

$$ p(\ A\ ) = p(\ A1\ \cap A2\ ) = p(\ A1\ )p(\ A2\ ) = p(\ F\ |\ M\ )p(\ F\ |\ Y\ ) = (0.6068)(0.6068) = 0.3682 $$

$p(\ B\ ) $:

Both have loaded coins and initiate our flip chain. That is,

  1. Given its me or HHT, coin being L, that is, $p(\ L\ |\ M\ )$ - case B1
  2. Given its you or THH, coin being L, that is, $p(\ L\ |\ Y\ )$ - case B2

We need both to be same type, in the sense, both case B1 and B2 should happen. Thus

$$ p(\ B\ ) = p(\ B1\ \cap B2\ ) $$

B1 and B2 are also independent to each other. We do not have influence on each other. So,..

$$ p(\ B\ ) = p(\ B1\ \cap B2\ ) = p(\ B1\ )p(\ B2\ ) = p(\ L\ |\ M\ )p(\ L\ |\ Y\ ) = (0.3932)(0.3932) = 0.1546 $$

Finally

$$ p(\ A\ \cup\ B\ ) = p(\ A\ ) + p(\ B\ ) = 0.3682 + 0.1546 = 0.523 $$


Thus, probability that we both picked same type of coin is 0.523 or 52.3%

The next set of questions are complicated, so we will see in separate section next

Problem Set 2 - Many Flips

Many Flips

This was such a tricky question that it needed a deep dive in to the concept before trying to answer the questions.

Note: I have created this with emphasis on visualizing, so as to understand bigger picture better. However due to lack of time, I have not sufficiently modularized, please ignore the code, if its is daunting or unreadable, it is not essential and focus only on output as the target here is mainly to understand the concepts.

In [1]:
# just a helper function for easier youtube call
def strip_url(url):
    return url.replace('https://youtu.be/','')

from IPython.display import YouTubeVideo
url = 'https://youtu.be/MwcRx5uXICM'
YouTubeVideo(strip_url(url))
Out[1]:

3 Flips

Let us visualize and comprehend first the probability concept for 3 flips.

In [2]:
from graphviz import Digraph
from coinflipviz import draw_graph, get_combinations, get_combinations_consolidated, plot_combinations_consolidated

n_flips = 3

g = Digraph()
g = draw_graph(g, n_flips)
g


Out[2]:
%3 Root R H1 H Root->H1 p T1 T Root->T1 q H2 H H1->H2 p T2 T H1->T2 q H3 H T1->H3 p T3 T T1->T3 q H4 H H2->H4 p T4 T H2->T4 q H5 H T2->H5 p T5 T T2->T5 q H6 H H3->H6 p T6 T H3->T6 q H7 H T3->H7 p T7 T T3->T7 q

Each flow from Root to end, is a possibility of a sequence. For eg, on left most of tree, we have a sequence HHH, 3 heads in all 3 flips. This terminology is important. Each sequence may have any number of heads or nothing at all. Let us check, what are all the possible sequences from 3 flips and no of heads in each of those sequences.

In [3]:
combi_df = get_combinations(n_flips)
combi_df
Out[3]:
sequence x
0 HHH 3
1 HHT 2
2 HTH 2
3 HTT 1
4 THH 2
5 THT 1
6 TTH 1
7 TTT 0

As you might have guessed, the no of sequence depends on no of flips n.

$$ \text {No of sequences} = 2^n , \text {where n is no of flips} $$

Let us check, the statistics of how many heads occur in each sequence, and count them.

In [4]:
final_df = get_combinations_consolidated(n_flips)
final_df
Out[4]:
x n(x) p(x)
0 0 1 0.125
1 1 3 0.375
2 2 3 0.375
3 3 1 0.125

So for 3 flips, we have a distribution of no of heads in sequences, and thus their associated probability as well. If we try to graph them, we get something like below.

In [5]:
plot_combinations_consolidated(final_df)

It is important to understand the above graph very well. For instance, on LHS, when X=1, it means, there are 3 sequences with 1 head, and we could calculate its probability as $\dfrac {3}{2^3}=0.375$ which is shown on RHS. Note the nature of the graph. Let us try again for 4 flips.

4 Flips

In [6]:
n_flips = 4

g = Digraph()
g = draw_graph(g, n_flips)
g



Out[6]:
%3 Root R H1 H Root->H1 p T1 T Root->T1 q H2 H H1->H2 p T2 T H1->T2 q H3 H T1->H3 p T3 T T1->T3 q H4 H H2->H4 p T4 T H2->T4 q H5 H T2->H5 p T5 T T2->T5 q H6 H H3->H6 p T6 T H3->T6 q H7 H T3->H7 p T7 T T3->T7 q H8 H H4->H8 p T8 T H4->T8 q H9 H T4->H9 p T9 T T4->T9 q H10 H H5->H10 p T10 T H5->T10 q H11 H T5->H11 p T11 T T5->T11 q H12 H H6->H12 p T12 T H6->T12 q H13 H T6->H13 p T13 T T6->T13 q H14 H H7->H14 p T14 T H7->T14 q H15 H T7->H15 p T15 T T7->T15 q
In [7]:
combi_df = get_combinations(n_flips)
combi_df
Out[7]:
sequence x
0 HHHH 4
1 HHHT 3
2 HHTH 3
3 HHTT 2
4 HTHH 3
5 HTHT 2
6 HTTH 2
7 HTTT 1
8 THHH 3
9 THHT 2
10 THTH 2
11 THTT 1
12 TTHH 2
13 TTHT 1
14 TTTH 1
15 TTTT 0
In [8]:
final_df = get_combinations_consolidated(n_flips)
final_df
Out[8]:
x n(x) p(x)
0 0 1 0.0625
1 1 4 0.2500
2 2 6 0.3750
3 3 4 0.2500
4 4 1 0.0625
In [9]:
plot_combinations_consolidated(final_df)

Now you see, where the graph goes as n increases? This is binomial distribution. Let us put aside formula and definition for a while and only focus on the nature of these graphs.

What happens when flips increase?

Let us find out by a small animation

In [10]:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation
from coinflipviz import autolabel, autoformat

fig, (ax1, ax2) = plt.subplots(1,2, figsize=(12,5))
plt.close()

def animate_combinations_consolidated(df, fontsize=10):
    """
    Given the dataframe with x, n(x), p(x) this provides two plots:
    x vs n(x)
    x vs p(x)
    """

    X = df['x'].tolist()
    N = df['n(x)'].tolist()
    P = df['p(x)'].tolist()

    rects = ax1.bar(X, N)
    autolabel(ax1, rects, fontsize)

    xlabel = 'No of Heads'
    ylabel = 'No of Sequences\nhaving those no of Heads'   
    autoformat(ax1, xlabel, ylabel, fontsize)


    rects = ax2.bar(X, P)
    autolabel(ax2, rects, fontsize)

    ylabel = 'Probability of ANY of Sequences\nhaving those no of Heads'   
    autoformat(ax2, xlabel, ylabel, fontsize)

def animate(i):

    ax1.clear()
    ax2.clear()

    fig.suptitle('No of flips  :  {}'.format(i), fontsize=14)
    temp_df = get_combinations_consolidated(i)
    animate_combinations_consolidated(temp_df)


n_flips = 10
ani = animation.FuncAnimation(fig, animate, np.arange(1, n_flips+1), interval=1000)


plt.tight_layout()
plt.subplots_adjust(wspace=0.5)    
from IPython.display import HTML
HTML(ani.to_html5_video())
Out[10]:
<matplotlib.figure.Figure at 0x2416a5b0630>
  1. Note that as we increase the no of flips, the graphs above approximate to a Guassian like structure.
  2. Maximum number of heads is always for half the number of flips. That is, if n = 10, then x = 5, that is sequence having 5 heads, has maximum probability. This is well observed, when n=even

The above graphical depictions would be very helpful to answer the quizes as we would see below.

Pascal's triangle

The numbers on the LHS, the n(x), interestingly are the same as in Pascal's triangle, where the row corresponds to the no of flips.

2018-07-20_17h11_37.png

These numbers are called binomial coefficients. That is, the No of sequences we get on LHS, are binomial co efficients. Pascal's triangle is made up of binomial co oefficients.


So for any number of flips x, one could of course look at the respective row in Pascal's triagle to get n(x) easily.

Quiz Answers

1. The Probability of every Sequence becomes Small - True or False?

Let us take an example. Say, no of flips = 4, Then

$$ \text {Total No of sequences} = 2^4 = 16 \\ \\ \text {Probability of any single sequence} = \dfrac {\text {No of that sequence}}{\text {Total No of sequences}} = \dfrac {1}{16} = 0.0625 $$

Generalizing, for any no of flips n, $$ \text {Total No of sequences} = 2^n \\ \\ \text {Probability of any single sequence} = \dfrac {\text {No of that sequence}}{\text {Total No of sequences}} = \dfrac {1}{2^n} $$

Obviously, as n increases, $\dfrac {1}{2^n}$ decreases.


Thus, as n increases, the probability of every Sequence becomes small. So its True

2. The probability of every number of heads becomes small - True or False?

Let no of flips = 4, Let us take specific number of heads, say, X = 2 (3 heads). Let us observe, what happens to X = 2, when n increases in below animation directly

In [11]:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation
from coinflipviz import autolabel, autoformat

fig, (ax1, ax2) = plt.subplots(1,2, figsize=(12,5))
plt.close()

def animate_combinations_consolidated(df, fontsize=10):
    """
    Given the dataframe with x, n(x), p(x) this provides two plots:
    x vs n(x)
    x vs p(x)
    """

    X = df['x'].tolist()
    N = df['n(x)'].tolist()
    P = df['p(x)'].tolist()

    rects = ax1.bar(X, N)
    autolabel(ax1, rects, fontsize)

    xlabel = 'No of Heads'
    ylabel = 'No of Sequences\nhaving those no of Heads'   
    autoformat(ax1, xlabel, ylabel, fontsize)


    rects = ax2.bar(X, P)
    autolabel(ax2, rects, fontsize)

    # highlight X = 2
    if len(rects) > 2:
        rects[2].set_color('r')

    ylabel = 'Probability of ANY of Sequences\nhaving those no of Heads'   
    autoformat(ax2, xlabel, ylabel, fontsize)

def animate(i):

    ax1.clear()
    ax2.clear()

    fig.suptitle('No of flips  :  {}     Red  : (X=2)'.format(i), fontsize=14)
    temp_df = get_combinations_consolidated(i)
    animate_combinations_consolidated(temp_df)


n_flips = 10
ani = animation.FuncAnimation(fig, animate, np.arange(1, n_flips+1), interval=1000)


plt.tight_layout()
plt.subplots_adjust(wspace=0.5)    
from IPython.display import HTML
HTML(ani.to_html5_video())
Out[11]:
<matplotlib.figure.Figure at 0x2416bbcc5c0>

So even though, initially it increased from 0.25 (for n=2), to 0.375 (n=3), thereafter it reduces and approaches to zero as can be seen on above animation.


Thus, as n increases, the probability of every number of heads become small. So its true

3. The probability of every proportion of heads becomes Small. True or False?

Let n = 4,

Let X = 50% of n.

Now this does not make sense for odd no of n, because then X would be non integer. For eg, if I have 5 flips, then X cannot be 2.5 heads! So we will check for only even no of flips.

Note:

Graph's y axis limits are maintained at constant, just to illustrate the reducing nature of probabilities. This had been the case all along, only that earlier, the y axis was auto scaling so reducing nature was only obvious from carefully observing numbers.

In [12]:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation
from coinflipviz import autolabel, autoformat

fig, (ax1, ax2) = plt.subplots(1,2, figsize=(12,5))
plt.close()

def animate_combinations_consolidated(i, df, fontsize=10):
    """
    Given the dataframe with x, n(x), p(x) this provides two plots:
    x vs n(x)
    x vs p(x)
    """

    X = df['x'].tolist()
    N = df['n(x)'].tolist()
    P = df['p(x)'].tolist()

    rects = ax1.bar(X, N)
    autolabel(ax1, rects, fontsize)

    xlabel = 'No of Heads'
    ylabel = 'No of Sequences\nhaving those no of Heads'   
    autoformat(ax1, xlabel, ylabel, fontsize)


    rects = ax2.bar(X, P)
    autolabel(ax2, rects, fontsize)        

    ylabel = 'Probability of ANY of Sequences\nhaving those no of Heads'   
    autoformat(ax2, xlabel, ylabel, fontsize)

    # highlight X corresponding to 50% of no of flips
    if ((i) % 2) == 0:  # if only even, 
        y = int(i/2)    
        if len(rects) > 2:
            #print(y)
            rects[y].set_color('r')    

    # also set y limit to keep the scale constant to view the effect as n increases
    ymin = ax2.get_ylim()[0]
    ymax = 0.5*1.1  # increase space to insert bar value and assuming we will not cross 0.5 prob ever
    ax2.set_ylim([ymin,ymax])     
    ymin = ax1.get_ylim()[0]
    ymax = 260*1.1  # just hardcoding for this example. MODIFY THIS IF YOU CHANGE N_FLIPS
    ax1.set_ylim([ymin,ymax])         


def animate(i):

    ax1.clear()
    ax2.clear()

    fig.suptitle('No of flips  :  {}    Red X = 50% of No of flips'.format(i), fontsize=14)
    temp_df = get_combinations_consolidated(i)
    animate_combinations_consolidated(i, temp_df)


n_flips = 10
ani = animation.FuncAnimation(fig, animate, np.arange(1, n_flips+1), interval=1000)


plt.tight_layout()
plt.subplots_adjust(wspace=0.5)    
from IPython.display import HTML
HTML(ani.to_html5_video())
Out[12]:
<matplotlib.figure.Figure at 0x2416a46fcf8>

From RHS graph, one could easily observe that the maximum probability that could occur (at $X = \dfrac {n}{2}$) for any n decreases, as n increases.

Since the max probability is reducing, it is also obvious, for any other X it would still reduce, which can also be observed from other adjacent blue bars from above RHS graph.


Thus, as n increases, X = any % of n decreases. So this is also True.

4. The probability of every range of proportions becomes small - True or False?

We will prove this by taking the exceptional case, where it is False. This is difficult to observe at first, as there are wide range of proportions possible, but once we understand the nature of the "Gaussian" like curve of probabilities, it would be easier to comprehend.

Note the phrase range of proportions. This means, from any % of n to any other % of n. The exceptional case we take up to argue its falseness is,

$X \leq 50\%$ of n

Suppose, n = 5, then $X \leq \dfrac {5}{2} = X \leq 2.5$, thus X = 0, 1, 2

We should then find

$$ p(X \leq 2.5) = p(X = 0) + p(X = 1) + p(X = 2) $$

Note also that, for n = even, we cannot accurately divide 50%, as a fair share of probability goes to middle X=50% case (just like we saw earlier), so for this case, we will take only X = odd to prove our point.

This partiality does not matter, as n tends to go infinity, we get a smooth guassian like density function, and we just take area for range of X, instead of individual bars as of now.

In [13]:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation
from coinflipviz import autolabel, autoformat

fig, (ax1, ax2) = plt.subplots(1,2, figsize=(12,5))
plt.close()

def animate_combinations_consolidated(i, df, fontsize=10):
    """
    Given the dataframe with x, n(x), p(x) this provides two plots:
    x vs n(x)
    x vs p(x)
    """

    X = df['x'].tolist()
    N = df['n(x)'].tolist()
    P = df['p(x)'].tolist()

    rects = ax1.bar(X, N)
    autolabel(ax1, rects, fontsize)

    xlabel = 'No of Heads'
    ylabel = 'No of Sequences\nhaving those no of Heads'   
    autoformat(ax1, xlabel, ylabel, fontsize)


    rects = ax2.bar(X, P)
    autolabel(ax2, rects, fontsize)        

    ylabel = 'Probability of ANY of Sequences\nhaving those no of Heads'   
    autoformat(ax2, xlabel, ylabel, fontsize)

    # highlight X corresponding to 50% of no of flips
    if (i % 2) == 0:  
        pass
    else: # if only odd
        from math import floor
        i_50_range = int(i/2) 
        for each_bar in range(0, i_50_range+1):
            rects[each_bar].set_color('r')  
            ax2.text(-1,0.52,'  Red Area: 0.5', fontsize=12)

    # also set y limit to keep the scale constant to view the effect as n increases
    ymin = ax2.get_ylim()[0]
    ymax = 0.5*1.1  # increase space to insert bar value and assuming we will not cross 0.5 prob ever
    ax2.set_ylim([ymin,ymax])     
    ymin = ax1.get_ylim()[0]
    ymax = 260*1.1  # just hardcoding for this example. MODIFY THIS IF YOU CHANGE N_FLIPS
    ax1.set_ylim([ymin,ymax])         


def animate(i):

    ax1.clear()
    ax2.clear()

    fig.suptitle('No of flips  :  {}    Red X <= 50% of No of flips'.format(i), fontsize=14)
    temp_df = get_combinations_consolidated(i)
    animate_combinations_consolidated(i, temp_df)


n_flips = 10
ani = animation.FuncAnimation(fig, animate, np.arange(1, n_flips+1), interval=1000)


plt.tight_layout()
plt.subplots_adjust(wspace=0.5)    
from IPython.display import HTML
HTML(ani.to_html5_video())
Out[13]:
<matplotlib.figure.Figure at 0x2416a7fdb00>

From RHS graph, pause for every odd no of flips, and try calculating sum of red bar values. It would always be 0.5.

$ n = 1,\ X \leq (1)(0.5),\ p(X \leq 0)\ = \ p(X = 0) \ = \ 0.5 \\ \\ n = 3,\ X \leq (3)(0.5), \ p(X \leq 1)\ = \ p(X = 0) \ + p(X = 1)\ = \ 0.125 + 0.375 = 0.5 \\ \\ n = 5,\ X \leq (5)(0.5), \ p(X \leq 2)\ = \ p(X = 0) \ + p(X = 1)\ + p(X=2)\ = \ 0.03125+0.15625+0.31250 = 0.5 \\ \\ n = 7,\ X \leq (7)(0.5), \ p(X \leq 3)\ = \ p(X = 0) \ + p(X = 1)\ + p(X=2)\ + p(X=3)\ = \ 0.00781+0.05468+0.16406+0.2734375 = 0.5 \\ \\ n = 9,\ X \leq (9)(0.5), \ p(X \leq 4)\ = \ p(X = 0) \ + p(X = 1)\ + p(X=2)\ + p(X=3)\ + p(X=4)\ = \ 0.001953+0.017578125+0.0703125+0.1640625+0.24609375= 0.5 \\ \\ $

This is a typical charactersitic of such a distribution. This could be extended that, as n approaches $\infty$, the smooth probability curve we get, always have half of its area to be of 0.5 probability. This makes sense, after all, total probability (total area of the curve) should be 1, and it is symmetrical (as you can see how it is growing), so one could easily infer, half of it would be having 50% probability.


Thus, since the total probability remains constant at 0.5 for X $\leq$ 50% of n, not every range of proportions has probability reduces as n increases. So this statement is False

5. The probability of some ranges of proportions becomes small - True or False?

What if our range of proportions is itself, less than or more than 50%? Let us find out for 33% (to be inline with goldsong's explanation.

$X \leq 33\%$ of n

Earlier we chose odd X, because, we were able to find X the sum of which wholly satisfied our < 50% criteria. In this case, it is tricky. It would be evident, when we plot 33% of n values in a table.

In [14]:
import pandas as pd

cols = ['n_flips', 'n_33_range', 'floored_x', 'ceiled_x']

df = pd.DataFrame(columns=cols)

def yellow(val):
    color = 'yellow'
    return 'background-color: %s' % color

for i in range(1, 11):  # 10 flips
    n_flips = i
    n_33_range = i*0.33
    floored_x = int(i*0.33)
    from math import ceil
    ceiled_x = int(ceil(i*0.33))
    df = df.append({'n_flips' : n_flips, 'n_33_range' : n_33_range, 'floored_x' : floored_x, 'ceiled_x': ceiled_x}, ignore_index=True)

df[['n_flips','floored_x','ceiled_x']] = df[['n_flips','floored_x','ceiled_x']].astype(int)
df = df.astype(object).T  # ref: https://stackoverflow.com/questions/47907195/pandas-transpose-resets-decimal-rounding

#list(df.columns.values)
df.style.applymap(yellow, subset=pd.IndexSlice[:,[2,5,8]])
Out[14]:
0 1 2 3 4 5 6 7 8 9
n_flips 1 2 3 4 5 6 7 8 9 10
n_33_range 0.33 0.66 0.99 1.32 1.65 1.98 2.31 2.64 2.97 3.3
floored_x 0 0 0 1 1 1 2 2 2 3
ceiled_x 1 1 1 2 2 2 3 3 3 4

Observe that, when no of flips = 3, 6, 9, etc, as highlighted, we have 33% of n as a whole number (to observe equivalent no of heads).

For example,

When n_flips = 3, n_33_range = 0.99 $\cong$ 1, so X = 1 could almost wholly represent 33% of n_flips
When n_flips = 6, n_33_range = 1.98 $\cong$ 2, so X = 2 could almost wholly represent 33% of n_flips
When n_flips = 9, n_33_range = 2.97 $\cong$ 3, so X = 3 could almost wholly represent 33% of n_flips

However, for other values of n_flips, picking up floor or ceiled value would leave gaps introducing undesired errors that could screw up our observation.

For example,

When n_flips = 2, n_33_range = 0.66, so neither X = 0 or X = 1 is good approximation to be taken for observation.


Thus, in this case, we will calculate total area of 33% of n_flips, only when n_flips is at multiples of 3. If we have 10 flips, we could thus calculate at 3rd flip, 6th flip, 9th flip and observe if total area at those flips increase or decrease.

Also note, ceiled value of n_33_range, serves better than floored value unlike earlier cases.

Visualizing for 33%.. (n_flips increased to 12, to have one more multiple of 3 datapoint)

In [15]:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation
from coinflipviz import autolabel, autoformat

fig, (ax1, ax2) = plt.subplots(1,2, figsize=(12,5))
plt.close()

def animate_combinations_consolidated(i, df, fontsize=10):
    """
    Given the dataframe with x, n(x), p(x) this provides two plots:
    x vs n(x)
    x vs p(x)
    """

    X = df['x'].tolist()
    N = df['n(x)'].tolist()
    P = df['p(x)'].tolist()

    rects = ax1.bar(X, N)
    autolabel(ax1, rects, fontsize)

    xlabel = 'No of Heads'
    ylabel = 'No of Sequences\nhaving those no of Heads'   
    autoformat(ax1, xlabel, ylabel, fontsize)


    rects = ax2.bar(X, P)
    autolabel(ax2, rects, fontsize)        

    ylabel = 'Probability of ANY of Sequences\nhaving those no of Heads'   
    autoformat(ax2, xlabel, ylabel, fontsize)


    # highlight X corresponding to 50% of no of flips
    if (i % 3) == 0:  # only if multiple of 3 as reasoned earlier

        from math import ceil
        i_33_range = int(ceil(i*0.33))  # ceiled value serves better in our case
        red_area = 0
        for each_bar in range(0, i_33_range+1):
            rect = rects[each_bar]
            rect.set_color('r')  
            red_area += rect.get_height()

        ax2.text(-1,0.52,'  Red Area : {}'.format(red_area), fontsize=12)

    # also set y limit to keep the scale constant to view the effect as n increases
    ymin = ax2.get_ylim()[0]
    ymax = 0.5*1.1  # increase space to insert bar value and assuming we will not cross 0.5 prob ever
    ax2.set_ylim([ymin,ymax])     
    ymin = ax1.get_ylim()[0]
    ymax = 950*1.1  # just hardcoding for this example. MODIFY THIS IF YOU CHANGE N_FLIPS
    ax1.set_ylim([ymin,ymax])         


def animate(i):

    ax1.clear()
    ax2.clear()

    fig.suptitle('No of flips  :  {}    Red X <= 33% of No of flips'.format(i), fontsize=14)
    temp_df = get_combinations_consolidated(i)
    animate_combinations_consolidated(i, temp_df)


n_flips = 12
ani = animation.FuncAnimation(fig, animate, np.arange(1, n_flips+1), interval=1000)


plt.tight_layout()
plt.subplots_adjust(wspace=0.5)    
from IPython.display import HTML
HTML(ani.to_html5_video())
Out[15]:
<matplotlib.figure.Figure at 0x2416a4da5f8>

Thus, you could observe from graph that,

$ \small { n = 3,\ X \leq (3)(0.33),\ p(X \leq 0.99)\ \cong p(X \leq 1)\ = p(X=0) + p(X=1) = 0.125+0.375 = 0.5 \\ \\ n = 6,\ X \leq (6)(0.33), \ p(X \leq 1.98)\ \cong p(X \leq 2)\ = p(X=0) + p(X=1) + p(X=2) = 0.01562+0.09375+0.234375 = 0.344 \\ \\ n = 9,\ X \leq (9)(0.33), \ p(X \leq 2.97)\ \cong p(X \leq 3)\ = p(X=0) + p(X=1) + p(X=2) +p(X=3) = \\ \\ 0.001953125 + 0.017578125 + 0.0703125 + 0.1640625 = 0.2539 \\ \\ n = 12,\ X \leq (12)(0.33), \ p(X \leq 3.96)\ \cong p(X \leq 4)\ = p(X=0) + p(X=1) + p(X=2) +p(X=3) + p(X=4) = \\ \\ \dfrac {1}{2^{12}} + \dfrac {12}{2^{12}} + \dfrac {66}{2^{12}} + \dfrac {220}{2^{12}} + \dfrac {495}{2^{12}} = 0.00024414062 + 0.0029296875 + 0.01611328125 + 0.053710937 + 0.12084960937 = 0.19384765574 } $

Thus, as you could see, for X $\leq$ 33% of n, the probability decreases as n increases.


So, yes, for some ranges, the probabilities decrease as n increases. So it is True.

Note:

Take care trying more number of flips with above snippets. They are not efficient, and thus time sensitive. The nature of our problem is, the numbers grow exponentially, so the PC might have for long time, even for n greater than 15.

Problem Set 3 - Many Flips Ctd

Many Flips - Ctd

Is it Fair 2

In [11]:
# just a helper function for easier youtube call
def strip_url(url):
    return url.replace('https://youtu.be/','')

from IPython.display import YouTubeVideo
url = 'https://youtu.be/Nq1p7MdxlBU'
YouTubeVideo(strip_url(url))
Out[11]:

These are thankfully pretty straight forward, so we could directly calculate using Bayes' theorem.

4 Heads and 0 Tails

This means, 4 flips, and sequence HHHH

So question is what is the probability of fair coin, given HHHH?

$ p(Fair\ |\ Flips) = \dfrac {p(Fair\ \cap\ Flips)}{\sum p(Flips\ )} \\ \\ p(Fair\ |\ HHHH) = \dfrac {p(Fair\ \cap\ HHHH)}{\sum p(HHHH\ )} = \dfrac {p(Fair\ \cap\ HHHH)}{p(Fair\ \cap\ HHHH) + p(Loaded\ \cap\ HHHH)} = \dfrac {(0.9)(0.5)^4}{(0.9)(0.5)^4 + (0.1)(0.9)^4} = 0.4615 $

In [12]:
p_F = 0.9
p_HF = 0.5

p_L = 0.1
p_HL = 0.9

# 4 flips, 4 Heads and 0 Tails
p_F_given_H_4 = p_F*(p_HF)**4/(p_F*(p_HF)**4 + p_L*(p_HL**4))
p_F_given_H_4
Out[12]:
0.4615952732644018

10 Heads and 0 Tails

Similar like above, this time to the power of 10

$ p(Fair\ |\ Flips) = \dfrac {p(Fair\ \cap\ Flips)}{\sum p(Flips\ )} \\ \\ p(Fair\ |\ 10H) = \dfrac {p(Fair\ \cap\ 10H)}{\sum p(10H\ )} = \dfrac {p(Fair\ \cap\ 10H)}{p(Fair\ \cap\ 10H) + p(Loaded\ \cap\ 10H)} = \dfrac {(0.9)(0.5)^{10}}{(0.9)(0.5)^{10} + (0.1)(0.9)^{10}} = 0.02458 $

In [13]:
# 10 flips, 10 Heads and 0 Tails
p_F_given_H_10 = p_F*(p_HF)**10/(p_F*(p_HF)**10 + p_L*(p_HL**10))
p_F_given_H_10
Out[13]:
0.024587025215085937

20 Heads and 0 Tails

Similar like above, this time to the power of 20

$ p(Fair\ |\ Flips) = \dfrac {p(Fair\ \cap\ Flips)}{\sum p(Flips\ )} \\ \\ p(Fair\ |\ 20H) = \dfrac {p(Fair\ \cap\ 20H)}{\sum p(20H\ )} = \dfrac {p(Fair\ \cap\ 20H)}{p(Fair\ \cap\ 20H) + p(Loaded\ \cap\ 20H)} = \dfrac {(0.9)(0.5)^{20}}{(0.9)(0.5)^{20} + (0.1)(0.9)^{20}} = 0.0007059 $

In [14]:
# 20 flips, 20 Heads and 0 Tails
p_F_given_H_20 = p_F*(p_HF)**20/(p_F*(p_HF)**20 + p_L*(p_HL**20))
p_F_given_H_20
Out[14]:
7.059301781108519e-05

0 Heads and 5 Tails

Similar like above, except that its tails now.

Implied No of flips = 5

$ p(Fair\ |\ Flips) = \dfrac {p(Fair\ \cap\ Flips)}{\sum p(Flips\ )} \\ \\ p(Fair\ |\ 5T) = \dfrac {p(Fair\ \cap\ 5T)}{\sum p(5T\ )} = \dfrac {p(Fair\ \cap\ 5T)}{p(Fair\ \cap\ 5T) + p(Loaded\ \cap\ 5T)} = \dfrac {(0.9)(0.5)^{5}}{(0.9)(0.5)^{5} + (0.1)(0.1)^{5}} = 0.9999 \cong 1 $

Wow that is almost always probable. If you have 5 flips, fair coin having 5 T is almost certain?? Some one please double check this.

In [15]:
# 5 flips, 0 Heads and 5 Tails
p_TF = 0.5

p_TL = 0.1

p_F_given_T_5 = p_F*(p_TF**5)/( p_F*(p_TF**5) +  p_L*(p_TL**5))
p_F_given_T_5
Out[15]:
0.999964445708597

5 Flips, 2 Heads, 3 Tails

n = 5, X = 2 (heads), We could use Pascal's triangle, to find the no of sequences having 2 heads for 5 flip problem.

image.png

From Pascal's triangle, there are 10 sequences possible where, there are 2 Heads and 3 Tails.

For fair coin, each of those 10 sequences will have joint probability, $p(2H3T) = (0.5)^2(0.5)^3 = 0.5^5 = 0.03125 $

For loaded coin, each of those 10 sequences will have joint probability $p(2H3T) = (0.9)^2(0.1)^3 = 0.00081 $

Joint probability of coin being fair and having 2H3T for one sequence $ (0.9)(0.03125) = 0.028125 $

There are 10 such joint probabilities for fair coin path out of $2^5 = 32$ fair coin path outcomes

Joint probability of coin being loaded and having 2H3T for one sequence $ (0.1)(0.00081) = 0.000081 $

There are 10 such joint probabilities for loaded coin path out of $2^5 = 32$ loaded coin path outcomes

$ p(Fair\ |\ Flips) = \dfrac {p(Fair\ \cap\ Flips)}{\sum p(Flips\ )} \\ \\ p(Fair\ |\ 2H3T) = \dfrac {p(Fair\ \cap\ 2H3T)}{\sum p(2H3T\ )} = \dfrac {p(Fair\ \cap\ 2H3T)}{p(Fair\ \cap\ 2H3T) + p(Loaded\ \cap\ 2H3T)} = \dfrac { 10(0.028125) }{ 10(0.028125) + 10(0.000081)} = 0.9971 \cong 1 $

In [16]:
0.028125/(0.028125 + 0.000081)
Out[16]:
0.9971282705807275

Thus, answer is (probabilities which are < 0.5)


4 H 0 T
10 H 0 T
20 H 0 T

Problem Set 4 - Visualizing Many Flips

Visualizing Many Flips

Comprehending Many Flips Problem

Problem Set 2 Probability section is a huge speed bump in Intro to Statistics - Udacity course, providing quizzes of high levels, which could be answered only with deeper knowledge of maths and also probability theory which is in strong contrast with flow of the course (which is irony because typical reader is new to it, that is why he is into the course in first place).

I have created below snippets to ease the pain. It provides functions to visualize and comprehend probability as no of flips increases. Hopefully it is useful to you.

Basic python is a pre requisite to understand the functions, but usage is simple and straight forward.

Visualize Probability Tree

In [1]:
from graphviz import Digraph

g = Digraph()

def draw_graph(g, n_flips=2):
    """
    Given no of flips, this function creates a corresponding probability tree
    """
    #g.attr(rankdir='LR', ranksep='0.5')
    g.attr('node', shape='circle', fontsize='10')
    g.attr('edge', fontsize='10')
    g.node('Root','R',style='filled', fillcolor='#DCDCDC')    # first node

    i_outcome = 1
    parent_list = []

    for each_flip in range(1, n_flips+1):
        n_outcomes = 2**each_flip

        temp_list = []
        p_index = 0 # parent index for each node
        for each_outcome in range(0, int(n_outcomes/2)):  

            # draw nodes, record parents
            new_H = 'H{}'.format(i_outcome) 
            new_T = 'T{}'.format(i_outcome)             
            g.node(new_H, 'H', style='filled', fillcolor='#FFFAAE')
            g.node(new_T, 'T', style='filled', fillcolor='#D2FFFF')                     
            i_outcome += 1
            parents = parent_list[-1] if len(parent_list) > 0 else []
            parent = parents[p_index] if len(parents) > 0 else None

            # debug
            #print('Flip:{} New H:{} New T:{} Parents:{} Parent Index:{}'.format(each_flip, new_H, new_T,list(parents), p_index))
            #print('Flip:{} New H:{} New T:{} Parent:{}'.format(each_flip, new_H, new_T,parent))

            # draw edges
            if parent is not None:
                g.edge(parent, new_H)
                g.edge(parent, new_T)
            else: 
                g.edge('Root', new_H)
                g.edge('Root', new_T)


            # for next set of H and T
            p_index += 1
            temp_list.append(new_H)
            temp_list.append(new_T)



        parent_list.append(temp_list)
        print()
    return g

draw_graph(g, n_flips=3)


Out[1]:
%3 Root R H1 H Root->H1 T1 T Root->T1 H2 H H1->H2 T2 T H1->T2 H3 H T1->H3 T3 T T1->T3 H4 H H2->H4 T4 T H2->T4 H5 H T2->H5 T5 T T2->T5 H6 H H3->H6 T6 T H3->T6 H7 H T3->H7 T7 T T3->T7

Final Outcomes for any number of flips

In [2]:
import pandas as pd

def get_combinations(n_flips=2):
    """
    Given the no of flips, this function will provide the final sequence combinations as a panda dataframe
    """
    # setup data frame with necessary cols
    columns = ['sequence', 'x']
    df = pd.DataFrame(columns=columns)

    # get the combinations
    from itertools import product
    for i in product(['H','T'], repeat=n_flips):     
        combi = ''.join(i)
        n_H = combi.count('H') # no of heads
        df = df.append({'sequence': combi, 'x': n_H }, ignore_index=True)

    # get no of heads in the combinations
    print('Given no of flips:', n_flips)
    print('\nx = no of heads in respective sequence')

    return df

combi_df = get_combinations(n_flips=3)
combi_df
Given no of flips: 3

x = no of heads in respective sequence
Out[2]:
sequence x
0 HHH 3
1 HHT 2
2 HTH 2
3 HTT 1
4 THH 2
5 THT 1
6 TTH 1
7 TTT 0

Characteristics of Probability of Sequences

In [3]:
def get_combinations_consolidated(n_flips=2):
    """
    Given the raw dataframe of combinations, this will provide n(x) and p(x)
    """
    # setup data frame with necessary cols
    columns = ['x', 'n(x)', 'p(x)']
    df = pd.DataFrame(columns=columns)

    # get raw data
    combi_df = get_combinations(n_flips=n_flips)
    x_list = combi_df['x'].tolist()

    # extract frequency
    #ref: https://stackoverflow.com/questions/2161752/how-to-count-the-frequency-of-the-elements-in-a-list/2162045
    x_list.sort()
    from itertools import groupby 
    freq_tuple = [ (key, len(list(group))) for key, group in groupby(x_list)]
    #print(freq_tuple)

    for each_freq_tuple in freq_tuple:
        x = each_freq_tuple[0]
        n_x = each_freq_tuple[1]
        p_x = n_x/(2**n_flips)  # its a conditional probability, thats y divided by total outcomes
        df = df.append({'x': x, 'n(x)': n_x, 'p(x)': p_x }, ignore_index=True)

    # convert cols to integer (except p(x))
    df[['x','n(x)']] = df[['x','n(x)']].astype(int) #ref: https://stackoverflow.com/questions/21291259/convert-floats-to-ints-in-pandas/21291622

    print('n(x) = total no of possible x type sequences')
    print('for eg, if x = 2, n(x) = 3, then there are 3 possible sequence types, in each of which, no of heads is 2')    

    print('\np(x) = conditional probability that n(x) could occur out of all outcomes')

    return df

final_df = get_combinations_consolidated(3)
final_df
Given no of flips: 3

x = no of heads in respective sequence
n(x) = total no of possible x type sequences
for eg, if x = 2, n(x) = 3, then there are 3 possible sequence types, in each of which, no of heads is 2

p(x) = conditional probability that n(x) could occur out of all outcomes
Out[3]:
x n(x) p(x)
0 0 1 0.125
1 1 3 0.375
2 2 3 0.375
3 3 1 0.125
In [4]:
import matplotlib.pyplot as plt

def autoformat(ax, xlabel, ylabel, fontsize):
    """
    Few tweaks for better graph
    """
    ax.set_xlabel(xlabel, fontsize=fontsize)
    ax.set_ylabel(ylabel, fontsize=fontsize)

    for label in (ax.get_xticklabels() + ax.get_yticklabels()):
        label.set_fontsize(fontsize) # Size here overrides font_prop

    ymin = ax.get_ylim()[0]
    ymax = ax.get_ylim()[1]*1.1  # increase space to insert bar value
    ax.set_ylim([ymin,ymax])

    # x values should be integers as its no of heads
    xmin = -1
    xmax = ax.get_xlim()[1]
    from math import ceil
    xmaxint = ceil(xmax)+1
    xint = range(xmin, xmaxint)
    ax.set_xticks(xint)    


def autolabel(ax, rects, fontsize):
    """
    Attach a text label above each bar displaying its height
    ref: https://matplotlib.org/2.0.2/examples/api/barchart_demo.html
    """    
    for rect in rects:
        height = rect.get_height()
        #text = '%.4f' % height
        text = '{0: <{width}}'.format(height, width=1) 
        ax.text(rect.get_x() + rect.get_width()/2., 1.005*height,text, ha='center', va='bottom', fontsize=fontsize+3, color='red')

def plot_combinations_consolidated(df, fontsize=10):
    """
    Given the dataframe with x, n(x), p(x) this provides two plots:
    x vs n(x)
    x vs p(x)
    """
    fig, (ax1, ax2) = plt.subplots(1,2, figsize=(12,5))

    X = df['x'].tolist()
    N = df['n(x)'].tolist()
    P = df['p(x)'].tolist()

    rects = ax1.bar(X, N)
    autolabel(ax1, rects, fontsize)

    xlabel = 'No of Heads'
    ylabel = 'No of Sequences\nhaving those no of Heads'   
    autoformat(ax1, xlabel, ylabel, fontsize)


    rects = ax2.bar(X, P)
    autolabel(ax2, rects, fontsize)

    ylabel = 'Probability of ANY of Sequences\nhaving those no of Heads'   
    autoformat(ax2, xlabel, ylabel, fontsize)

    plt.tight_layout()
    plt.subplots_adjust(wspace=0.45)    
    plt.show()

plot_combinations_consolidated(final_df)
<matplotlib.figure.Figure at 0x1c2067a7f60>

Let us try again all for no of flips as 4

This time from library where we have moved these functions..

In [5]:
from coinflipviz import draw_graph, get_combinations, get_combinations_consolidated, plot_combinations_consolidated

n_flips = 4

g = Digraph()
g = draw_graph(g, n_flips)
g



Out[5]:
%3 Root R H1 H Root->H1 T1 T Root->T1 H2 H H1->H2 T2 T H1->T2 H3 H T1->H3 T3 T T1->T3 H4 H H2->H4 T4 T H2->T4 H5 H T2->H5 T5 T T2->T5 H6 H H3->H6 T6 T H3->T6 H7 H T3->H7 T7 T T3->T7 H8 H H4->H8 T8 T H4->T8 H9 H T4->H9 T9 T T4->T9 H10 H H5->H10 T10 T H5->T10 H11 H T5->H11 T11 T T5->T11 H12 H H6->H12 T12 T H6->T12 H13 H T6->H13 T13 T T6->T13 H14 H H7->H14 T14 T H7->T14 H15 H T7->H15 T15 T T7->T15
In [6]:
combi_df = get_combinations(n_flips)
combi_df
Out[6]:
sequence x
0 HHHH 4
1 HHHT 3
2 HHTH 3
3 HHTT 2
4 HTHH 3
5 HTHT 2
6 HTTH 2
7 HTTT 1
8 THHH 3
9 THHT 2
10 THTH 2
11 THTT 1
12 TTHH 2
13 TTHT 1
14 TTTH 1
15 TTTT 0
In [7]:
final_df = get_combinations_consolidated(n_flips)
final_df
Out[7]:
x n(x) p(x)
0 0 1 0.0625
1 1 4 0.2500
2 2 6 0.3750
3 3 4 0.2500
4 4 1 0.0625
In [8]:
plot_combinations_consolidated(final_df)

Problem Set 5 - Visualizing Many Flips - Ctd

Visualizing Many Flips - Ctd

Objective

We have already seen coin flips, and how they relate to Pascal's triangle, how they evolve in to binomial distribution earlier in Problem Set 2 - Probability section.

In 19A. Central Limit Theorem section, Sebastian ups the game with tossing a die with 3 outcomes (1,2,3). While we could follow pattern to calculate results, visualizing how that outcomes would be could be helpful to strengthen our insights. So I have created another set of functions that could help us do that.

Note: This section focuses only on programming part that helps us visualize the outputs when number of outcomes vary. I will have another section dedicated to 19A. Central Limit Theorem where I pull up these functions and illustrate the examples provided by Sebastian.

In [1]:
# just a helper function for easier youtube call
def strip_url(url):
    return url.replace('https://youtu.be/','')

from IPython.display import YouTubeVideo
url = 'https://youtu.be/jajxhNBnbmI'
YouTubeVideo(strip_url(url))
Out[1]:

Toss Definition

Suppose we have a dice, that produces 3 outcomes 0, 1, 2 when tossed. Suppose we toss it 2 times. The questions at hand are,

  1. What are the possible outcomes/sequences
  2. What are the number of occurences of sum of their individual outcomes?
  3. What are their probabilities (assuming equal probability for n outcomes as $\dfrac {1}{n}$)

What are the possible outcomes/sequences?

In [2]:
from graphviz import Digraph
import colorsys
import random

def HSVToHex(h, s, v):
    (r, g, b) = colorsys.hsv_to_rgb(h, s, v)
    hexy = "".join("%02X" % round(r*255)) + "".join("%02X" % round(g*255)) + "".join("%02X" % round(b*255))
    return hexy

def getDistinctColors(n):
    huePartition = 1.0 / (n + 1)
    saturation = 0.2
    lightness = 0.9
    colors_list = [HSVToHex(huePartition * value, saturation, lightness) for value in range(0, n)]
    return colors_list#, colors_comp_list

def draw_graph(g, n_outcomes = 2, n_flips=2):
    """
    Given no of flips, this function creates a corresponding probability tree
    For now, assuming outcomes are numbered
    """
    #g.attr(rankdir='LR', ranksep='0.5')
    g.attr('node', shape='circle', fontsize='10')
    g.attr('edge', fontsize='10')
    g.node('Root','R',style='filled', fillcolor='#DCDCDC')    # first node

    i_kids = 1
    parent_list = []

    # colors for each outcome
    colors_list = getDistinctColors(n_outcomes)
    #print(colors_list)

    # for each flip
    for each_flip in range(1, n_flips+1):
        per_flip_outcomes = n_outcomes**(each_flip)

        # for each node outcome of flip
        temp_list = []
        p_index = 0 # parent index for each node
        for each_outcome in range(0, int(per_flip_outcomes/n_outcomes)):


            # draw nodes, record parents
            for each_kid in range(0, n_outcomes):                

                new_kid_ID = '{}'.format(i_kids) # Id the kid
                new_kid_label = str(each_kid)
                g.node(new_kid_ID, new_kid_label, style='filled', fillcolor='#{}'.format(colors_list[each_kid]), fontcolor='black')

                parents = parent_list[-1] if len(parent_list) > 0 else []
                parent = parents[p_index] if len(parents) > 0 else None

                #debug
                #print('Flip:{} Outcome:{} Kid\'s Label:{} Possible ID:{} Parent:{}'.format(each_flip, each_outcome, each_kid, i_kids, parent))    

                i_kids += 1

                # draw edges
                if parent is not None:
                    g.edge(parent, new_kid_ID)
                else: 
                    g.edge('Root', new_kid_ID)

                # for next set of kids                    
                temp_list.append(new_kid_ID)

            p_index += 1

        parent_list.append(temp_list)
        print()

    return g


g = Digraph(strict=True)
g = draw_graph(g, n_outcomes = 3, n_flips=1)
g

Out[2]:
%3 Root R 1 0 Root->1 2 1 Root->2 3 2 Root->3
In [3]:
g = Digraph(strict=True)
g = draw_graph(g, n_outcomes = 3, n_flips=2)
g

Out[3]:
%3 Root R 1 0 Root->1 2 1 Root->2 3 2 Root->3 4 0 1->4 5 1 1->5 6 2 1->6 7 0 2->7 8 1 2->8 9 2 2->9 10 0 3->10 11 1 3->11 12 2 3->12

Tabular View

Let X denote the total sum of any sequence after each toss/flip.

In [4]:
import pandas as pd

def sum_digits(n):
    n = int(n)
    s = 0
    while n:
        s += n % 10
        n //= 10
    return s

def get_combinations(n_outcomes = 2, n_flips=2):
    """
    Given the no of flips, this function will provide the final sequence combinations as a panda dataframe
    """
    # setup data frame with necessary cols
    columns = ['sequence', 'x']
    df = pd.DataFrame(columns=columns)

    # generate the individual outcomes
    outcomes = list(range(0,n_outcomes))  # so if 3, its 0,1,2
    outcomes = [str(i) for i in outcomes]  # convert all to string

    # get the combinations
    from itertools import product
    for i in product(outcomes, repeat=n_flips):  
        combi = ''.join(i)
        summy = sum([int(x) for x in i])
        #print(i,combi, summy)
        df = df.append({'sequence': combi, 'x': summy }, ignore_index=True)

    # get no of heads in the combinations
    #print('Given no of flips:', n_flips)
    #print('\nx = no of heads in respective sequence')

    return df

get_combinations(n_outcomes = 3, n_flips=2)
Out[4]:
sequence x
0 00 0
1 01 1
2 02 2
3 10 1
4 11 2
5 12 3
6 20 2
7 21 3
8 22 4

What are the number of occurences of sum of their individual outcomes? And their probabilities?

Now let us take the number of occurances for different values of X. For eg, n(X=2) implies, the number of sequences having sum of individual outcomes being 2

In [5]:
def get_combinations_consolidated(n_outcomes = 2, n_flips=2):
    """
    Given the raw dataframe of combinations, this will provide n(x) and p(x)
    """
    # setup data frame with necessary cols
    columns = ['x', 'n(x)', 'p(x)']
    df = pd.DataFrame(columns=columns)

    # get raw data
    combi_df = get_combinations(n_outcomes=n_outcomes, n_flips=n_flips)
    x_list = combi_df['x'].tolist()

    # extract frequency
    #ref: https://stackoverflow.com/questions/2161752/how-to-count-the-frequency-of-the-elements-in-a-list/2162045
    x_list.sort()
    from itertools import groupby 
    freq_tuple = [ (key, len(list(group))) for key, group in groupby(x_list)]
    #print(freq_tuple)

    for each_freq_tuple in freq_tuple:
        x = each_freq_tuple[0]
        n_x = each_freq_tuple[1]
        p_x = n_x/(n_outcomes**n_flips)  # its a conditional probability, thats y divided by total outcomes
        df = df.append({'x': x, 'n(x)': n_x, 'p(x)': p_x }, ignore_index=True)

    # convert cols to integer (except p(x))
    df[['x','n(x)']] = df[['x','n(x)']].astype(int) #ref: https://stackoverflow.com/questions/21291259/convert-floats-to-ints-in-pandas/21291622

    #print('n(x) = total no of possible x type sequences')
    #print('for eg, if x = 2, n(x) = 3, then there are 3 possible sequence types, in each of which, no of heads is 2')    

    #print('\np(x) = conditional probability that n(x) could occur out of all outcomes')

    return df

final_df = get_combinations_consolidated(n_outcomes = 3, n_flips=2)
final_df
Out[5]:
x n(x) p(x)
0 0 1 0.111111
1 1 2 0.222222
2 2 3 0.333333
3 3 2 0.222222
4 4 1 0.111111
In [7]:
import matplotlib.pyplot as plt

def autoformat(ax, xlabel, ylabel, fontsize):
    """
    Few tweaks for better graph
    """
    ax.set_xlabel(xlabel, fontsize=fontsize)
    ax.set_ylabel(ylabel, fontsize=fontsize)

    for label in (ax.get_xticklabels() + ax.get_yticklabels()):
        label.set_fontsize(fontsize) # Size here overrides font_prop

    ymin = ax.get_ylim()[0]
    ymax = ax.get_ylim()[1]*1.1  # increase space to insert bar value
    ax.set_ylim([ymin,ymax])    

    # x values should be integers as its no of heads
    xmin = -1
    xmax = ax.get_xlim()[1]
    from math import ceil
    xmaxint = ceil(xmax)+1
    xint = range(xmin, xmaxint)
    ax.set_xticks(xint)


def autolabel(ax, rects, fontsize):
    """
    Attach a text label above each bar displaying its height
    ref: https://matplotlib.org/2.0.2/examples/api/barchart_demo.html
    """    
    for rect in rects:
        height = rect.get_height()
        #text = '%.4f' % height
        text = '{0: <{width}}'.format(height, width=1) 
        ax.text(rect.get_x() + rect.get_width()/2., 1.005*height,text, ha='center', va='bottom', fontsize=fontsize+3, color='red')

def plot_combinations_consolidated(df, fontsize=10):
    """
    Given the dataframe with x, n(x), p(x) this provides two plots:
    x vs n(x)
    x vs p(x)
    """
    fig, (ax1, ax2) = plt.subplots(1,2, figsize=(12,5))

    X = df['x'].tolist()
    N = df['n(x)'].tolist()
    P = df['p(x)'].tolist()

    rects = ax1.bar(X, N)
    autolabel(ax1, rects, fontsize)

    xlabel = 'Sum of each outcome in a Sequence'
    ylabel = 'No of Sequences\nhaving that Sum'   
    autoformat(ax1, xlabel, ylabel, fontsize)


    rects = ax2.bar(X, P)
    autolabel(ax2, rects, fontsize)

    ylabel = 'Probability of ANY of Sequences\nhaving that Sum'   
    autoformat(ax2, xlabel, ylabel, fontsize)

    plt.tight_layout()
    plt.subplots_adjust(wspace=0.45)    
    plt.show()

plot_combinations_consolidated(final_df)

Tests..

In [12]:
g = Digraph(strict=True)
n_outcomes = 3
n_flips = 2
g = draw_graph(g, n_outcomes = n_outcomes, n_flips=n_flips)
g

Out[12]:
%3 Root R 1 0 Root->1 2 1 Root->2 3 2 Root->3 4 0 1->4 5 1 1->5 6 2 1->6 7 0 2->7 8 1 2->8 9 2 2->9 10 0 3->10 11 1 3->11 12 2 3->12
In [13]:
final_df = get_combinations_consolidated(n_outcomes = n_outcomes, n_flips=n_flips)
plot_combinations_consolidated(final_df)